Exercise: Relying on Composition

In this exercise, you'll work with composition and abstraction. The goal of the exercise is to improve code that is used by a famous YouTuber to manage backing up his video files.

The code below is part of a script that this famous YouTuber uses to search through video backups, which are stored in a bucket (= file storage location) in the cloud:

class ACCloud(CloudProvider):
  def __init__(self, bucket_name: str, region: str) -> None:
    authentication = GoogleAuth("service_key.json")
    super().__init__(
      region=region,
      http_auth=authentication,
      secure=True,
    )
    self.bucket_name = bucket_name

  def find_files(
    self,
    query: str,
    max_result: int
  ) -> list[str]:
    response = self.filter_by_query(
      bucket=self.bucket_name,
      query=query,
      max=max_result
    )
    return response["result"]["data"][0]

class VideoStorage(ACCloud):
  def __init__(self) -> None:
    super().__init__(
      bucket_name="video-backup.arjancodes.com",
      region="eu-west-1c",
    )

The main (fake) dependency is the cloudengine package which contains the CloudProvider abstract base class.

The classes in this example are nested, in other words multiple levels of inheritance. VideoStorage. inherits from ACCloud which in turn inherits CloudProvider.

The problem with these layers of inheritance is that the code becomes hard to read.

a) Refactor this code so that it no longer uses inheritance, but relies on composition instead.

b) Refactor the code once more so that find_files is no longer dependent on the cloudengine package.

Compatible Python Versions: 3.9+


Alexandre Gourmelon

Hi Arjan, Can you review my code please? Thanks.

from dataclasses import dataclass
from functools import partial
from typing import Protocol, TypeAlias, Sequence

QueryResponse: TypeAlias = dict[str, dict[str, Sequence[list[str]]]]

class CloudProvider(Protocol):

def filter_by_query(bucket: str, query: str, max: int) -> QueryResponse:
...

@dataclass
class ACCloud:
bucket_name: str
region: str
cloud_provider: CloudProvider

def find_files(self, query: str, max_result: int) -> list[str]:
response = self.cloud_provider.filter_by_query(
bucket=self.bucket_name, query=query, max=max_result
)
return response["result"]["data"][0]

VideoStorage: type[ACCloud] = partial(
ACCloud,
bucket_name="video-backup.arjancodes.com",
region="eu-west-1c",
)

By the way, I have a doubt about `return response["result"]["data"][0]`, where I thinking if it might be better to abstract it: ACCloud should not be have such a deep knowledge about the reponse object? What do you think? How would you do?

REPLY
Arjan Egges

Hi Alexandre, indeed, I would put retrieving the data in the cloud provider implementation. It's likely that the response you get from Amazon vs Google or Microsoft is going to have a different structure, so it makes sense to extract the data in the communication layer (for example in GoogleCloudProvider or AWSCloudProvider).

REPLY
Kervi Ramos

Would having the line of code `return response["result"]["data"][0]` in ACCloud instead of CloudProvider violate the Law of Demeter?

REPLY
Andreas [ArjanCodes Team]

Yes, it would. In this case, a possible solution would be to encapsulate the access with a method in ACCloud

REPLY
Agustin Rodriguez

Hi arjan and team.
This is my solution

class ACCloud(Protocol):
""" cloud storage inteface """

def find_files(self, query: str, max_result: int) -> list[str]:
...

@dataclass
class VideoStorage:
""" video storage class """
bucket_name="video-backup.arjancodes.com",
region="eu-west-1c",
CloudProvider = CloudProvider

def find_files(self, query: str, max_result: int) -> list[str]:
""" find files in the cloud """
response = self.CloudProvider.filter_by_query(
bucket=self.bucket_name, query=query, max=max_result
)
return response["result"]["data"][0]

REPLY
Andreas [ArjanCodes Team]

Nice start! Good job with removing the inheritance. However, there are some problems with the solution.

* How come the attributes of the dataclass are comma-separated?
* The attribute CloudProvider is a bit misleading, it looks like it is both an attribute but also a reference to the class object.
* Could you add an example where the ACCloud is used within the solution?

REPLY
Agustin Rodriguez

Yes, I understood.

How come the attributes of the dataclass are comma-separated? That was my error.
The attribute CloudProvider is a bit misleading; it looks like it is both an attribute and a reference to the class object. Yes, it is confusing. I will fix this.
Could you add an example where ACCloud is used within the solution? This point isn't clear to me; I am confused with the class names. But here is my solution.
class ACCloud(Protocol):
""" cloud storage inteface """

def find_files(self, query: str, max_result: int) -> list[str]:
""" find files in the cloud """

@dataclass
class VideoStorage:
""" video storage class """
bucket_name: str
provider: CloudProvider

def find_files(self, query: str, max_result: int) -> list[str]:
""" find files in the cloud """
response = self.provider.filter_by_query(
bucket=self.bucket_name, query=query, max=max_result
)
return response["result"]["data"][0]

def main():
""" main function """
authentication = GoogleAuth("service_key.json")
region="eu-west-1c"
provider = CloudProvider(
region=region,
http_auth=authentication,
secure=True,
)
bucket_name="video-backup.arjancodes.com"
storage = VideoStorage(bucket_name=bucket_name, provider=provider)
storage.find_files("video", 10)

REPLY
Andreas [ArjanCodes Team]

Currently, there is a protocol ACCloud that is defined, but it is not used anywhere. The purpose of the protocol is to enforce a certain structure.

For example, if you have a function definition like def get_files_in_storage(storage: ACCloud). Then, that means objects that adhere to the structure of ACCloud (In other words, have the find_files method implemented) can be passed as an parameter to that function.

REPLY
Jorge Alvarado

What do you think about this solution? I removed all "external" dependencies:

from dataclasses import dataclass
from typing import Any, Protocol, Type

QueryResult = dict[str, dict[str, list[str]]]

class CloudProviderContract(Protocol):
def __init__(self, region: str, http_auth: Any, secure: bool): ...

def filter_by_query(self, bucket: str, query: str, max: int) -> QueryResult: ...

@dataclass
class ACCloud:
cloud_provider: CloudProviderContract
bucket_name: str

def find_files(self, query: str, max_result: int) -> list[str]:
response = self.cloud_provider.filter_by_query(
bucket=self.bucket_name, query=query, max=max_result
)
return response["result"]["data"]

def create_video_storage(
authentication: Any,
provider_contract_class: Type[CloudProviderContract],
bucket_name: str = "video-backup.arjancodes.com",
region: str = "eu-west-1c",
) -> ACCloud:
provider = provider_contract_class(
region=region, http_auth=authentication, secure=True
)
ac_cloud = ACCloud(cloud_provider=provider, bucket_name=bucket_name)
return ac_cloud

REPLY
Andreas [ArjanCodes Team]

Looks good! There is no inheritance and it relies on composition. Furthermore, the find_files is not dependent on the cloudengine package.

REPLY
Luca Baldini

I found I bit unclear this exercise. The reason is that CloudProvider is not an abstract class.
I found reasonable to use instance of it to define the provider and achieve data composition with a functional approach. That is, I only provided the find file function that inspect a CloudProvider filter result according to the necessary parameters. I hope I understand the exercise correctly.

# pylint:disable=C0116

from cloudengine.google import GoogleAuth
from cloudengine.provider import CloudProvider

def find_files(
provider: CloudProvider, query: str, bucket: str, max_result: int
) -> list[str]:
response = provider.filter_by_query(bucket=bucket, query=query, max=max_result)
return response["result"]["data"][0]

if __name__ == "__main__":
google_provider = CloudProvider(
region="eu-west-1c", http_auth=GoogleAuth("service_key.json"), secure=True
)

files_on_google = find_files(
google_provider, query="my_query", bucket="google_bucket", max_result=100
)

files_on_video_storage = find_files(
google_provider,
query="my_query",
bucket="video-backup.arjancodes.com",
max_result=100,
)

REPLY
Arjan Egges

Hi Luca, that is exactly the idea! To improve this even more, you could use a Protocol to provide an abstraction layer between the CloudProvider type and what the other functions expect (you can find an example of this in the second part of the solution)

REPLY
Show More